The following analysis is based on the publicly available Rebrickable database, which contains information about LEGO parts, colors, part categories, sets and themes, including the year of production for each set. The goal is to visualize the relations between these attributes and draw conclusions from them. Special emphasis is placed on time-based analysis as a means of understanding how past changes and trends may have influenced later company decisions, which may in turn support further conclusions in the future. Lastly, unique and rare LEGO parts are analyzed to understand which themes are particularly likely to contain them and, in effect, what collectors should set their sights on.
library(data.table)
library(tidyr)
library(ggplot2)
library(dplyr)
library(treemapify)
library(ggridges)
library(DT)
library(plotly)
colors <- fread("rebrickable/colors.csv.gz")
inventory_parts <- fread("rebrickable/inventory_parts.csv.gz")
inventories <- fread("rebrickable/inventories.csv.gz")
sets <- fread("rebrickable/sets.csv.gz")
themes <- fread("rebrickable/themes.csv.gz")
part_categories <- fread("rebrickable/part_categories.csv.gz")
parts <- fread("rebrickable/parts.csv.gz")
background_color <- "#2d2d2d"
font_color <- "white"
tick_color <- "#DDDDDD"
title_size <- 20
label_size <- 16
tick_size <- 12
grid_size <- 0.5
orange <- "#FF7E67"
coral <- "#FFFFC0"
teal <- "#62e2e2"
selected_years <- seq(1940, 2025, by=10)
In the first section we will explore data regarding themes.
parents <- themes %>% filter(is.na(parent_id)) %>% group_by(id, name)
find_parent <- function(parent_id) {
if (is.na(parent_id)) {
"parent"
} else {
prev_id <- parent_id
while(!is.na(parent_id)){
prev_id <- parent_id
parent_id <- themes$parent_id[themes$id==parent_id]
}
parents$name[parents$id==prev_id]
}
}
set_themes_children <- rename(sets, set_name=name) %>%
merge(rename(themes, theme_name=name), by.x="theme_id", by.y="id")
# vapply returns a plain character vector; lapply would create a list column
set_themes_children$parent_name <- vapply(set_themes_children$parent_id, find_parent, character(1))
set_themes_children <- set_themes_children %>%
mutate(parent_name = ifelse(parent_name=="parent", theme_name, parent_name))
ridge_years <- seq(1940, 2020, by=10)
set_themes_children %>%
ggplot(aes(x=year, y=as.character(parent_name), fill=as.character(parent_name))) +
geom_density_ridges(alpha=0.6, bandwidth=0.8, color="#CCCCCC") +
theme_ridges() +
scale_x_continuous(breaks = ridge_years, limits = c(1945, 2025)) +
ggtitle("Distribution of Themed Set Releases in Years 1949-2023") +
theme(legend.position = "none",
plot.background = element_rect(fill=background_color),
plot.title = element_text(size=title_size, colour = font_color, hjust=1.7),
axis.title.x = element_text(size=label_size, colour = font_color),
axis.title.y = element_blank(),
axis.text.x = element_text(size=tick_size, color = tick_color),
axis.text.y = element_text(size=tick_size, color = tick_color),
panel.grid.major.x = element_line(color = rgb(235, 235, 235, 100, maxColorValue = 255)),
panel.background = element_rect(fill=background_color)) +
annotate("text", x = ridge_years+1.5, y = 141.75, label = ridge_years, angle = 90, size=4,
color="white") +
annotate("text", x = ridge_years+1.5, y = 120, label = ridge_years, angle = 90, size=4,
color="white", alpha=0.6) +
annotate("text", x = ridge_years+1.5, y = 100, label = ridge_years, angle = 90, size=4,
color="white", alpha=0.6) +
annotate("text", x = ridge_years+1.5, y = 80, label = ridge_years, angle = 90, size=4,
color="white", alpha=0.6) +
annotate("text", x = ridge_years+1.5, y = 60, label = ridge_years, angle = 90, size=4,
color="white", alpha=0.6) +
annotate("text", x = ridge_years+1.5, y = 40, label = ridge_years, angle = 90, size=4,
color="white", alpha=0.6) +
annotate("text", x = ridge_years+1.5, y = 20, label = ridge_years, angle = 90, size=4,
color="white", alpha=0.6)
This density ridge plot shows the distribution of releases of LEGO themes over the years. The analysis uses parent themes, and the sets of their subordinate themes are counted together with them. The themes are sorted alphabetically, starting from the bottom of the plot.
The plot makes it possible to approximate when the production of certain themes began and ended, as well as when their peaks of popularity occurred. Some themes received only a couple of sets (such as The Powerpuff Girls), so their trends are not visible on the plot.
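The roll-up of subordinate themes into their top-level parent can be illustrated on a toy table. The sketch below is a minimal base R example, assuming a table with the same id, name and parent_id columns as the themes data. The toy_themes values and the root_theme_name helper are hypothetical and only mirror the idea behind find_parent.

```r
# Toy themes table (hypothetical values) with the columns of themes.csv
toy_themes <- data.frame(
  id        = c(1, 2, 3, 4),
  name      = c("Technic", "Arctic Technic", "Star Wars", "Expert Builder"),
  parent_id = c(NA, 1, NA, 2),
  stringsAsFactors = FALSE
)

# Follow parent_id links until a theme without a parent is reached
root_theme_name <- function(theme_id, themes_df) {
  current <- theme_id
  repeat {
    parent <- themes_df$parent_id[themes_df$id == current]
    if (is.na(parent)) break
    current <- parent
  }
  themes_df$name[themes_df$id == current]
}

# One root name per theme, returned as a character vector
toy_themes$root_name <- vapply(toy_themes$id, root_theme_name,
                               character(1), themes_df = toy_themes)
print(toy_themes)
```

In this toy data the theme with id 4 is two levels below its root, so the loop has to follow two links before it stops.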
merged_df <- merge(sets, themes, by.x = "theme_id", by.y = "id", all.x = TRUE)
star_years <- seq(1950, 2020, by=10)
popular_themes <- merged_df %>%
count(name.y, sort = TRUE) %>%
head(5)
popular_themes <- popular_themes$name.y[1:5] # keep the names of the 5 most popular themes
merged_df_filtered <- merged_df[merged_df$name.y %in% popular_themes, ]
theme_year_counts <- merged_df_filtered %>%
group_by(name.y, year) %>%
summarise(num_sets = n()) %>%
ungroup()
theme_year_counts_cumsum <- theme_year_counts %>%
group_by(name.y) %>%
mutate(cum_sum = cumsum(num_sets))
#Plot 1->CumSum of Sets for Chosen Themes
ggplot(data = theme_year_counts_cumsum, aes(x = year, y = cum_sum, color = name.y)) +
geom_line(size = 1) +
scale_x_continuous(breaks = star_years) +
scale_color_manual(values=c("#FF4040", "#00FF40", "#FFFF00", "#FF8000", "#00FFFF")) +
labs(x = "Year", y = "Number of Sets", color = "Theme") +
ggtitle("Cumulative Sum of Sets for Chosen Themes") +
theme(plot.background = element_rect(fill=background_color),
plot.title=element_text(size=title_size, colour = font_color, hjust=0.4),
axis.title.x = element_text(size=label_size, colour = font_color),
axis.title.y = element_text(size=label_size, color = font_color),
axis.text.x = element_text(size=tick_size, color = tick_color),
axis.text.y = element_text(size=tick_size, color = tick_color),
panel.grid.minor = element_blank(),
panel.background = element_rect(fill=background_color),
legend.background = element_rect(fill=background_color),
legend.text = element_text(size=tick_size, color = tick_color),
legend.title = element_text(size=label_size, color = font_color),
legend.key = element_rect(fill=background_color))
The plot shows the cumulative sum for the number of sets for selected themes over the years. The chosen themes were based on their popularity, which was determined by the total number of produced sets. For this visualization, the top 5 themes with the highest number of produced sets were chosen.
As seen on the plot, the “Books” theme is the oldest among the presented themes and experienced slow growth until around the year 2010 when there was a sudden increase. Many of the presented themes experienced quick growth in the 2000s. “Technic” had the most sets created for it until relatively recently when the “Star Wars” theme’s explosive growth surpassed it. This illustrates the importance of the “Star Wars” brand for LEGO and how collaborating with a well-known brand can influence the company’s future.
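The two steps behind this plot, counting sets per theme and year and then accumulating the counts within each theme, can be shown on a small example. This is a base R sketch on hypothetical data (toy_sets is not part of the Rebrickable files). The analysis itself uses dplyr for the same steps.

```r
# Hypothetical sets: one row per set, with its theme and release year
toy_sets <- data.frame(
  theme = c("A", "A", "A", "B", "B", "A"),
  year  = c(2000, 2000, 2001, 2000, 2002, 2002),
  stringsAsFactors = FALSE
)

# Number of sets released per theme and year
toy_sets$num_sets <- 1
counts <- aggregate(num_sets ~ theme + year, data = toy_sets, FUN = sum)

# Sort by theme and year so the running total follows time order
counts <- counts[order(counts$theme, counts$year), ]

# Running total of sets within each theme
counts$cum_sum <- ave(counts$num_sets, counts$theme, FUN = cumsum)
print(counts)
```

Sorting before accumulating matters: without it, the running total would follow row order rather than the order of the years.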
star_wars_counts <- theme_year_counts %>% filter(name.y == "Star Wars")
film_releases=c(1999,2002,2005,2015,2017,2019)
film_colors=c("#ff3333", "orange", "green", "#FF40FF", "pink", "yellow")
film_labels=c("The Phantom Menace", "Attack of the Clones", "Revenge of the Sith",
"The Force Awakens", "The Last Jedi", "The Rise of Skywalker")
ggplot(data = star_wars_counts, aes(x = year, y = num_sets)) +
geom_line(size = 1, color=teal) +
scale_x_continuous(breaks = seq(1998, 2023, by=5)) +
labs(x = "Year", y = "Number of Sets") +
ggtitle("Number of Produced Star Wars Sets") +
geom_vline(xintercept = film_releases, color = film_colors, linetype="dotted") +
annotate("text", x = film_releases[1:3]+0.2, y = max(star_wars_counts$num_sets)*0.4,
label = film_labels[1:3], color = film_colors[1:3], angle = 90, hjust = -0.2, size=3.5) +
annotate("text", x = film_releases[4:6]+0.2, y = 0, label = film_labels[4:6],
color = film_colors[4:6], angle = 90, hjust = c(-0.4,-0.6,-0.4), size=3.5) +
theme(legend.position = "none",
plot.background = element_rect(fill=background_color),
plot.title=element_text(size=title_size, colour = font_color, hjust=0.4),
axis.title.x = element_text(size=label_size, colour = font_color),
axis.title.y = element_text(size=label_size, color = font_color),
axis.text.x = element_text(size=tick_size, color = tick_color),
axis.text.y = element_text(size=tick_size, color = tick_color),
panel.grid.minor = element_blank(),
panel.background = element_rect(fill=background_color))
The previous visualization has shown the significance of the “Star Wars” brand and theme for the LEGO company. It would now be beneficial to take a closer and more thorough look at the growth of the number of sets for the “Star Wars” theme.
The plot displays the number of sets produced in a given year for the “Star Wars” theme. Dotted vertical lines mark the release years of the prequel trilogy and the sequel trilogy. The original trilogy’s release dates are not shown since they predate the release of any “Star Wars” themed LEGO sets.
Regarding the general trend, the number of produced sets increased considerably over the years. Recently, however, the number of produced “Star Wars” themed sets has been declining steeply. Interestingly, 3 out of the 6 shown movie releases (Attack of the Clones, Revenge of the Sith, The Force Awakens) coincided with local maxima in the number of sets produced. Additionally, the release of “The Phantom Menace” coincided with the start of the collaboration between LEGO and Star Wars, with a significant spike visible within a year of the release. The release of “The Rise of Skywalker”, by contrast, was not accompanied by a significant increase in the number of produced themed sets, either before or after the premiere.
set_themes <- sets %>%
rename(set_name=name) %>%
merge(rename(themes, theme_name=name), by.x = "theme_id", by.y = "id")
set_theme_inventories <- inventories %>% merge(set_themes, by="set_num")
set_theme_inventories_parts <- inventory_parts %>% merge(set_theme_inventories, by.x="inventory_id", by.y="id")
set_theme_inventories_parts_colors <- colors %>% rename(color_name=name) %>%
merge(set_theme_inventories_parts, by.x="id", by.y="color_id") %>% rename(color_id=id)
theme_colors <- set_theme_inventories_parts_colors %>% filter(is.na(parent_id)) %>%
select(one_of(c("color_name","rgb","theme_name","theme_id","color_id")))
grouped_theme_colors <- theme_colors %>% group_by(theme_name, color_name, color_id, rgb) %>%
summarise(count=n(), .groups="keep") %>% arrange(theme_name, desc(count))
prettyTable <- function(table_df, round_columns_func=is.numeric, round_digits=0) {
DT::datatable(table_df, style="bootstrap", filter = "top", rownames = FALSE, extensions = "Buttons",
options = list(dom = 'Bfrtip', buttons = c('copy', 'csv', 'excel', 'pdf', 'print'))) %>%
formatRound(unlist(lapply(table_df, round_columns_func)), round_digits)
}
prettyTable(grouped_theme_colors)
The table allows searching through the colors used in parts of parent themes, making it possible to discover which colors define each theme. The table can be copied, saved as a CSV, Excel or PDF file, or printed. It includes a search bar for looking up specific values and supports sorting by column and filtering by part count.
This section is focused on the measurements of parts from the inventory_parts dataset.
Data organization:
merged_df <- merge(sets, themes, by.x = "theme_id", by.y = "id", all.x = TRUE)
merged_df_inventories <- merge(inventories, merged_df, by = "set_num", all.x = TRUE)
inventory_parts <- inventory_parts %>% rename(id = inventory_id)
# Rename the id column of part_categories itself (not of inventory_parts)
part_categories <- part_categories %>% rename(part_cat_id = id)
merged_df_inventory_parts <- merge(merged_df_inventories, inventory_parts, by="id")
merged_df_inventory_parts_final <- merge(merged_df_inventory_parts, parts, by = "part_num")
parts_count <- merged_df_inventory_parts_final %>% count(part_num, name = "count")
merged_counts <- merge(merged_df_inventory_parts_final, parts_count, by = "part_num", all.x=T)
result <- merged_counts %>%
filter(count == 1)
result <- result %>%
count(year)
result <- na.omit(result)
ggplot(result, aes(x = year, y = n)) +
geom_line(color=teal, size=1) +
labs(x = "Year", y = "Number of Unique LEGO Parts") +
ggtitle("Number of Unique LEGO Blocks per Year") +
scale_x_continuous(breaks=selected_years) +
theme(plot.background = element_rect(fill=background_color),
plot.title=element_text(size=title_size, colour = font_color, hjust=0.4),
axis.title.x = element_text(size=label_size, colour = font_color),
axis.title.y = element_text(size=label_size, color = font_color),
axis.text.x = element_text(size=tick_size, color = tick_color),
axis.text.y = element_text(size=tick_size, color = tick_color),
panel.grid.minor = element_blank(),
panel.background = element_rect(fill=background_color)) +
annotate(geom="text", x=1963, y=930, label="Introduction of Modulex bricks", size=4, color=font_color)
The plot shows the number of unique LEGO blocks produced in a given year. Unique parts in this case are defined as all LEGO parts that appear only in one set.
The number of unique LEGO blocks appears to have increased throughout the years, which makes sense, as with time more sets are produced in a year by the company. As such, more unique blocks may also be produced. However, the growth is not smooth, and there appear to be significant spikes throughout the years in the number of unique blocks produced.
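The definition of a unique part can be made concrete with a small example. The sketch below uses hypothetical data (toy_parts) and is a variant rather than the exact computation above: it assumes that "unique" means "occurs in exactly one distinct set". The code above counts inventory rows instead, so a part listed twice within a single set, for example in two colors, would not be treated as unique there.

```r
# Hypothetical inventory rows: which part appears in which set
toy_parts <- data.frame(
  set_num  = c("S1", "S1", "S1", "S2", "S2", "S3"),
  part_num = c("p1", "p2", "p2", "p1", "p3", "p4"),
  year     = c(1999, 1999, 1999, 2005, 2005, 2005),
  stringsAsFactors = FALSE
)

# Keep one row per (set, part) so repeated rows within a set count once
pairs <- unique(toy_parts[, c("set_num", "part_num", "year")])

# Number of distinct sets each part occurs in
sets_per_part <- table(pairs$part_num)
unique_part_ids <- names(sets_per_part)[sets_per_part == 1]

# Unique parts per release year of the set that contains them
unique_rows <- pairs[pairs$part_num %in% unique_part_ids, ]
unique_per_year <- table(unique_rows$year)
print(unique_per_year)
```

Here p1 is shared by two sets and is dropped, while p2 stays unique even though it is listed twice in set S1.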
result_cumsum <- result %>%
arrange(year) %>%
mutate(n_cumsum = cumsum(n))
ggplot(result_cumsum, aes(x = year, y = n_cumsum)) +
geom_line(color=teal, size=1) +
labs(x = "Year", y = "Number of Rare LEGO Parts") +
ggtitle("Cumulative Sum of Rare LEGO Parts through Years") +
scale_x_continuous(breaks=selected_years) +
theme(plot.background = element_rect(fill=background_color),
plot.title=element_text(size=title_size, colour = font_color, hjust=0.4),
axis.title.x = element_text(size=label_size, colour = font_color),
axis.title.y = element_text(size=label_size, color = font_color),
axis.text.x = element_text(size=tick_size, color = tick_color),
axis.text.y = element_text(size=tick_size, color = tick_color),
panel.grid.minor = element_blank(),
panel.background = element_rect(fill=background_color))
The plot shows the cumulative sum of unique LEGO blocks produced up to a given year. Unique parts in this case are defined as all LEGO parts that appear in only one set.
The graph shows how the total number of unique blocks increased through the years. Interestingly, the growth appears to be approximately exponential, although not completely smooth: sharp increases in individual years are noticeable on the plot’s line.
merged_df <- merge(sets, themes, by.x = "theme_id", by.y = "id", all.x = TRUE)
avg_parts_per_year <- merged_df %>%
group_by(year) %>%
summarize(avg_parts = mean(num_parts))
ggplot(data = avg_parts_per_year, aes(x = year, y = avg_parts)) +
geom_line(size = 1, color=teal) +
labs(x = "Year", y = "Average Number of Parts per Set") +
ggtitle("Average Number of Parts per Set per Year") +
scale_x_continuous(breaks=selected_years) +
theme(plot.background = element_rect(fill=background_color),
plot.title=element_text(size=title_size, colour = font_color, hjust = 0.5),
axis.title.x = element_text(size=label_size, colour = font_color),
axis.title.y = element_text(size=label_size, colour = font_color),
axis.text = element_text(size=tick_size, color = tick_color),
panel.background = element_rect(fill=background_color),
panel.grid = element_line(color="#CCCCCC"),
panel.grid.minor = element_blank()) +
annotate(geom="text", x=1960, y=190, label="Kindergarten LEGO Set", size=4, color=font_color)
The plot shows the average number of parts in a set for each given year.
The average number of parts in a set has increased significantly over the years. However, there have been significant spikes and fluctuations, as can be seen in the graph. For example, the average number of parts in a LEGO set was much higher in 1960 than in the years before or after.
This section is devoted to plots relating parts data to themes.
result <- merged_counts
theme_counts <- result %>%
group_by(name.y) %>%
# count == 1 marks parts that occur only once in the data, i.e. rare parts
summarize(num_parts = mean(num_parts), count = sum(count == 1)) %>%
arrange(desc(count))
theme_counts <- head(arrange(theme_counts,desc(count)),100)
theme_counts <- theme_counts %>% rename('Average Number of Parts' = num_parts) %>%
rename('Number of Rare LEGO Parts'=count)
ggplotly(ggplot(theme_counts, aes(x = `Average Number of Parts`,
y = `Number of Rare LEGO Parts`,
text = name.y)) +
geom_point(color=teal) +
labs(x = "Average Number of Parts", y = "Number of Rare LEGO Parts") +
ggtitle("Number of Rare LEGO Parts vs Average Number of Parts for a Theme") +
theme(plot.background = element_rect(fill=background_color),
plot.title=element_text(size=title_size-1, colour = font_color, hjust = 1),
axis.title.x = element_text(size=label_size, colour = font_color),
axis.title.y = element_text(size=label_size, colour = font_color),
axis.text = element_text(size=tick_size, color = tick_color),
panel.background = element_rect(fill=background_color)))
This interactive scatterplot shows the number of rare LEGO parts on the y-axis and the average number of parts in a set for a given theme on the x-axis. Each point is therefore a theme with a given number of rare LEGO parts and a given average number of parts per set. Rare LEGO parts here are parts that appear in only one set across all themes. Using this plot, collectors can see which themes are most worth buying to obtain the highest number of unique parts relative to non-unique ones.
result_2 <- merged_counts %>%
filter(count == 1)
result_2<-result_2 %>%
count(name.y)
result_2<-na.omit(result_2)
result_2<-arrange(result_2, desc(n))
# Exclude the "Database Sets" theme before taking the top 20
result_2 <- filter(result_2, name.y != "Database Sets")
result_2<-head(result_2,20)
ggplot(result_2, aes(x = n, y = reorder(name.y, n))) +
geom_col(fill = coral) +
labs(title = "20 Themes with the Highest Amount of Unique LEGO Parts",
x = "Number of Unique Parts",
y = "Theme") +
theme(plot.background = element_rect(fill=background_color),
plot.title=element_text(size=title_size, colour = font_color, hjust = 4),
axis.title.x = element_text(size=label_size, colour = font_color, hjust=0.2),
axis.title.y = element_text(size=label_size, colour = font_color),
axis.text = element_text(size=tick_size, color = tick_color),
panel.grid = element_line(color="#DDDDDD"),
panel.background = element_rect(fill=background_color))
The horizontal barchart shows the names of 20 themes with the highest number of unique LEGO parts ordered according to the decreasing number of unique LEGO parts. Unique parts in this case are defined as all LEGO parts that appear only in one set.
Themes with the highest number of unique parts include, among others, “Duplo and Explore”, “Ninjago” and “Star Wars”. It appears that themes that could be considered visually “unique” are also more likely to rank among the themes with the highest number of unique parts. Some themes may also gain an advantage in this category simply by having more sets overall, like the “Star Wars” theme.
theme_part_counts <- merged_df %>%
group_by(name.y) %>%
summarize(avg_parts = mean(num_parts))
# Append the maximum number of parts found in a set with a given theme
theme_part_counts <- theme_part_counts %>%
left_join(merged_df %>%
group_by(name.y) %>%
summarize(max_parts = max(num_parts)),
by = "name.y")
top_20_avg_parts <- head(theme_part_counts %>% arrange(desc(avg_parts)),20)
top_20_max_parts <- head(theme_part_counts %>% arrange(desc(max_parts)),20)
# Second Plot
# Version A with Dots
ggplot(top_20_avg_parts, aes(x = reorder(name.y, avg_parts), y = avg_parts)) +
geom_bar(stat = "identity", fill=coral) +
coord_flip() +
labs(x = "Theme", y = "Average Number of Parts") +
geom_point(aes(reorder(name.y, avg_parts),y=max_parts), color=background_color,
fill= coral, shape=21, size=3, stroke=1.5) +
ggtitle("Average and Maximum Number of Parts for Chosen Themes") +
theme(plot.background = element_rect(fill=background_color),
plot.title=element_text(size=title_size, colour = font_color, hjust = 4),
axis.title.x = element_text(size=label_size, colour = font_color, hjust=0.2),
axis.title.y = element_text(size=label_size, colour = font_color),
axis.text = element_text(size=tick_size, color = tick_color),
panel.grid = element_line(color="#DDDDDD"),
panel.background = element_rect(fill=background_color))
The horizontal bar chart shows the average and maximum number of parts in a set for a given theme. The themes on the plot are ordered according to decreasing value of average number of parts for a set of that theme. The chosen themes in this case refer to themes with the highest average number of parts per set.
ggplot(top_20_avg_parts, aes(x = reorder(name.y, avg_parts), y = avg_parts)) +
geom_bar(stat = "identity", fill=coral) +
coord_flip() +
labs(x = "Theme", y = "Average Number of Parts in a Set") +
ggtitle("Average Number of Parts in Themes for Chosen Themes") +
theme(plot.background = element_rect(fill=background_color),
plot.title=element_text(size=title_size, colour = font_color, hjust = -8),
axis.title.x = element_text(size=label_size, colour = font_color, hjust=0.2),
axis.title.y = element_text(size=label_size, colour = font_color),
axis.text = element_text(size=tick_size, color = tick_color),
panel.background = element_rect(fill=background_color))
The horizontal bar chart shows the average number of parts for the 20 themes with the highest average number of parts in a set. The themes on the plot are ordered according to decreasing value of the average number of parts for a set of that theme.
In this section, data regarding sets is presented.
num_sets_per_year <- table(sets$year)
num_sets_per_year_df <- as.data.frame(num_sets_per_year)
colnames(num_sets_per_year_df) <- c("year", "num_sets")
selected_years <- c(1950, 1960, 1970, 1980, 1990, 2000, 2010, 2020)
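# Drop the last (incomplete) year before fitting the trend
num_sets_per_year_df_new <- head(num_sets_per_year_df,-1)
# year is a factor here, so as.integer() yields level codes 1..n rather than calendar years;
# these codes match the positions used on the discrete x axis of the plot below
exp_model <- lm(log(num_sets_per_year_df_new$num_sets)~ as.integer(num_sets_per_year_df_new$year))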
ggplot(num_sets_per_year_df_new, aes(x = year, y = num_sets, group=1)) +
geom_line(color = teal, size=1) +
geom_line(stat="smooth", method = "lm", formula = y ~ exp(coef(exp_model)[1] + coef(exp_model)[2] * x),
se = FALSE, color=orange, alpha=0.75, size=1) +
scale_x_discrete(breaks = selected_years) +
xlab("Year") +
ylab("Number of Sets") +
ggtitle("Number of Sets per Year") +
theme(plot.background = element_rect(fill=background_color),
plot.title=element_text(size=title_size, colour = font_color, hjust=0.4),
axis.title.x = element_text(size=label_size, colour = font_color),
axis.title.y = element_text(size=label_size, color = font_color),
axis.text.x = element_text(size=tick_size, color = tick_color),
axis.text.y = element_text(size=tick_size, color = tick_color),
panel.background = element_rect(fill=background_color))
The plot shows the total number of sets produced across all themes for each year. An exponential regression line has been added to show the observed trend of the growth of the number of produced sets in a year. The plot shows only the data up to 2022, as the information about the number of produced sets for 2023 is incomplete and could influence the shape of the regression line.
The increase appears to be exponential in nature, with the drop at the end caused by the fact that the data for 2023 only includes sets produced so far into the year. It’s likely that many more sets will still be produced until the end of the year. The exponential regression line predicts further growth in the number of produced sets for the year 2023.
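An exponential trend of this kind is commonly estimated by fitting a straight line to the logarithm of the counts. The sketch below shows that idea on synthetic numbers (the toy data frame and its 8% growth rate are made up for illustration). Unlike the code above, it assumes that year is stored as a number rather than a factor, so the slope can be read directly as a yearly growth rate.

```r
# Synthetic yearly counts (hypothetical, roughly 8% growth per year)
toy <- data.frame(year = 2000:2010)
toy$num_sets <- round(100 * 1.08^(toy$year - 2000))

# Fit log(num_sets) = a + b * year
fit <- lm(log(num_sets) ~ year, data = toy)

# exp(b) - 1 is the estimated relative growth per year
growth_rate <- exp(coef(fit)[["year"]]) - 1

# Back-transform the prediction for the next year to the original scale
pred_2011 <- exp(predict(fit, newdata = data.frame(year = 2011)))

cat("Estimated yearly growth:", round(100 * growth_rate, 1), "%\n")
cat("Predicted sets in 2011:", round(pred_2011), "\n")
```

Working on the log scale keeps the model linear, which is why lm can be used. The prediction has to be passed through exp to get back to a number of sets.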
In the next section we will explore the colors data.
color_id_sums <- inventory_parts %>% group_by(color_id) %>% summarise(sum = sum(quantity))
color_sums <- inner_join(x=colors, y=color_id_sums, by=join_by(id==color_id))
color_sums <- color_sums %>% mutate(
r = paste("0x", substr(rgb, start=1, stop=2), sep=''),
g = paste("0x", substr(rgb, start=3, stop=4), sep=''),
b = paste("0x", substr(rgb, start=5, stop=6), sep='')
)
# Number of most frequent colors shown in the treemap
top_n <- 100
top_colors <- color_sums %>% arrange(sum) %>% tail(top_n) %>% mutate(order = seq.int(top_n))
top_colors %>% ggplot(aes(area=sum, fill=factor(order), label=rgb)) +
geom_treemap() +
geom_treemap_text(min.size = 8, size=20, color=c(rep("white", nrow(top_colors)-3), 1, rep("white", 2))) +
scale_fill_manual(values=rgb(top_colors$r, top_colors$g, top_colors$b, maxColorValue=255)) +
ggtitle("Most Frequent Colors of Parts") +
theme(legend.position="none", plot.background = element_rect(fill=background_color),
plot.title=element_text(hjust=.5, size=title_size, colour = font_color))
This treemap represents the proportions of the 100 most popular colors, based on part quantities in the inventory. The fill colors of the treemap are the actual colors of the parts, and their RGB values are annotated on each rectangle.
The figure shows that the most common colors are white, grey and black. Characteristic tones of yellow, blue and red are also clearly visible.
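The treemap relies on turning the hexadecimal strings of the rgb column into colors that R can draw. The sketch below shows one way to do this in base R. It assumes six-character hex strings without a leading "#", and the three sample values are used only as an example.

```r
# Hex strings in the format of the rgb column (no leading "#")
hex <- c("FFFFFF", "05131D", "C91A09")

# Prefixing "#" gives a color specification R understands directly
fill_values <- paste0("#", hex)

# Channel values, in case they are needed separately
channels <- data.frame(
  r = strtoi(substr(hex, 1, 2), base = 16L),
  g = strtoi(substr(hex, 3, 4), base = 16L),
  b = strtoi(substr(hex, 5, 6), base = 16L)
)

# Rebuilding the colors from the channels gives the same strings
rebuilt <- rgb(channels$r, channels$g, channels$b, maxColorValue = 255)

print(fill_values)
print(channels)
```

If only the fill colors are needed, the prefixed strings can be passed to scale_fill_manual directly, without splitting them into channels first.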
trans_count <- colors %>% group_by(is_trans) %>% summarise(count = n()) %>%
mutate(
proportions = paste(round(count/sum(count),2)*100, "%", sep=''),
is_transparent = ifelse(is_trans == 'f', "No", "Yes"))
# No legend
trans_count %>%
ggplot(aes(x="", y=count, fill=is_transparent, label=proportions)) +
geom_bar(stat="identity", width=1) +
coord_polar("y", start = 0) +
ggtitle("Proportion of Transparent Colors") +
theme_void() +
geom_text(nudge_y = c(-45,-19), nudge_x = c(0.0025,0.2), size=9, color=font_color, fontface="bold") +
theme(plot.title = element_text(hjust=0.3, vjust=-1.5, color=font_color, size=title_size),
plot.background = element_rect(fill=background_color),
axis.ticks.length = unit(0, "pt"),
legend.position = "none") +
annotate(geom="text", x=1.9, y=20, label="Transparent", size=7, color=font_color) +
annotate(geom="text", x=1.9, y=125, label="Non-transparent", size=7, color=font_color)
This simple piechart represents the proportion of transparent and non-transparent colors in the colors dataset.
In this data analysis, multiple aspects of the data have been considered, particularly how the LEGO landscape has changed through sets and themes. Many themes have been introduced over the years, and in some cases the popularity of a theme, as reflected in the number of produced sets, coincided with real-world events that influenced the popularity of the brand the theme was based on. Real-world events can also be seen in the time-series data on parts: for example, the introduction of “Modulex bricks” had a significant impact on the number of unique LEGO parts produced that year. Furthermore, the continued, exponential increase in the number of produced sets may reflect the continued growth of the LEGO company, at least theoretically.
Secondly, the relation between themes and unique parts was explored, giving LEGO collectors a possible view of which themes may be worth investing in to acquire the most unique LEGO collection. Lastly, the relation between colors present in LEGO parts was also visualized to show what sorts of color arrangements a potential buyer may expect from the average LEGO set.
A limiting factor of this analysis, and something worth considering in future work, is the lack of sales data, such as the price and sales volume of individual sets. Such data would allow a deeper look into the relations between the past popularity of sets and themes, real-world events, and later decisions regarding the production of sets and themes. However, given the sensitivity of such data, it is understandable that it is not published.
In conclusion, the analysis allowed the readers to gain insight into the past trends present in LEGO, as well as visualize some interesting relations between sets and themes, unique parts and themes, as well as understand the most common colors used when creating LEGO parts.